Skip to content

Conversation

@dhardy
Copy link
Member

@dhardy dhardy commented Aug 12, 2025

Add additional tests from #1642 by @hpenne.

By merging these first we get to run them through CI over rand_chacha and review separately.

CC @tarcieri

@dhardy dhardy requested a review from newpavlov August 12, 2025 14:06
@dhardy
Copy link
Member Author

dhardy commented Sep 3, 2025

I completely replaced the vectors:

I also directly tested state after RNG construction. This is specific to rand_chacha but should be easy to carry over to chacha20. Tests pass on x86_64 and i686.

src/rngs/std.rs Outdated
Comment on lines 309 to 344
let seed = [0u8; 32];
let mut rng = StdRng::from_seed(seed);
let block = u32::MAX;
let words_per_block = 16;
rng.0.set_word_pos((block as u128) * words_per_block);

let mut results = [0u8; 64];
rng.fill_bytes(&mut results);
#[rustfmt::skip]
let expected = [
0xd7, 0xa6, 0xaf, 0x50, 0xf1, 0xc9, 0x2a, 0x29,
0x48, 0x42, 0x52, 0xbb, 0xfc, 0xe2, 0x06, 0xf1,
0x7d, 0x01, 0xdd, 0x13, 0x95, 0x30, 0xa3, 0x83,
0x0a, 0xb5, 0x83, 0xc1, 0xf6, 0x2e, 0x03, 0x12,
0x82, 0x93, 0x61, 0xa1, 0x9a, 0x8a, 0x95, 0x6c,
0xed, 0xea, 0x38, 0x04, 0x30, 0xff, 0x93, 0x2c,
0xd0, 0x52, 0xdb, 0x5e, 0x94, 0x77, 0x83, 0x50,
0x58, 0xb8, 0x0a, 0x27, 0x24, 0x06, 0xfc, 0x74,
];
assert_eq!(results, expected);

rng.fill_bytes(&mut results);
#[rustfmt::skip]
let expected = [
0xcc, 0x7b, 0x53, 0xdc, 0x11, 0x89, 0x4d, 0x26,
0x24, 0x05, 0x81, 0xb8, 0xa8, 0xf4, 0xf4, 0xe5,
0xaf, 0x40, 0x67, 0x05, 0x80, 0x12, 0x23, 0xb1,
0x3f, 0x82, 0x1f, 0xdc, 0xcb, 0xa6, 0xa6, 0x18,
0x8a, 0x63, 0xf8, 0xd3, 0xdc, 0x83, 0xcc, 0xbc,
0xed, 0x45, 0x1f, 0x4b, 0xa4, 0xe0, 0xda, 0xab,
0x22, 0x8a, 0xbb, 0x0d, 0x74, 0x39, 0xcc, 0x67,
0xe5, 0x0d, 0xf7, 0x12, 0x9f, 0x64, 0x6b, 0xad,
];
assert_eq!(results, expected);

assert_eq!(rng.0.get_word_pos(), (block as u128) * words_per_block + 32);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When generating test vectors that make use of an overflowing counter, I personally think it would be best to test at least 4 blocks of output. That way, if one of the backends has 32-bit counter addition on any of the 4 blocks generated in parallel, it will be caught. The 4-block test won't catch anything on chacha20 or rand_chacha, but it might catch a bug when/if we ever add a new backend for chacha20. For example, only testing with 3 blocks of output will not catch this bug, but testing 4 blocks will catch this bug:

    unsafe fn write_par_ks_blocks(&mut self, buffer: &mut [u32; 64]) {
        let mut blocks = [
            [self.state[0], self.state[1], self.state[2], self.state[3]],
            [
                self.state[0],
                self.state[1],
                self.state[2],
                add_counter!(self.state[3], self.ctrs[0], V), // correct addition
            ],
            [
                self.state[0],
                self.state[1],
                self.state[2],
                add_counter!(self.state[3], self.ctrs[1], V), // correct addition
            ],
            [
                self.state[0],
                self.state[1],
                self.state[2],
                vaddq_u32(self.state[3], self.ctrs[2]), // simple example of incorrect addition
            ],
        ];
    ...

While this mistake is unlikely to happen, it could happen and we might not have another test that will catch it. We would only need to increase the amount of blocks tested beyond 4 blocks in the future if we increased the results buffer size, which is pretty unlikely given that even with avx512, we would still be able to generate 4 blocks in parallel. But keep in mind that at the absolute minimum, 2 blocks need to be tested in order to actually observe the counter overflowing, but 4 blocks will test each block of the parallel-y generated blocks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I do notice that the current write_par_ks_blocks could be rewritten to use strictly 64 bit addition, but ideally the code would be condensed into a fn called rounds() that would use add_counter!() because any cipher or RNG could be able to use the rounds() fn

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I will add that you can still use only 2 blocks of expected output, but use different starting block_poses. For example, start at u32::MAX then check the first two blocks. Switch to u32::MAX - 1, then skip the first block and check the next 2 blocks, then switch to u32::MAX - 2 and skip the first two blocks and check the next 2 blocks. And that should cover all of the parblocks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nstilt1. I opted to test six blocks (one before wrap, four during, one after). I think this suffices?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further note: these test vectors are not exactly in the right place! StdRng does not support setting the IV/stream. It does not support directly setting the counter, though obviously one could generate 2^32 + 1 blocks.

Copy link

@nstilt1 nstilt1 Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair, the parblocks counter addition tests could probably only be ran in chacha20 given that there are multiple backends, where a new one could be added at any time (unless rand's CI/CD tests are going to use all of the backends. If rand CI/CD tests don't use all of the backends then the test only confirms that one of the backends is working properly). Whereas chacha20 will most likely test all backends it has implemented, since it needs to test them

Copy link

@nstilt1 nstilt1 Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested it out by mutating the neon backend, and the counter_wrapping_64_bit_counter test will fail if the fourth PARBLOCK uses 32-bit addition, so I am unsure if an extra PARBLOCKS counter test is required... it probably isn't

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these may be a little excessive here, but I don't see the harm in keeping them either (they're hardly slow and I'm not too fussed about the line count, though much more and it would have to be in a separate test).

@dhardy
Copy link
Member Author

dhardy commented Sep 4, 2025

Updated: rebased, u128 form to be less verbose, adjusted comments and extended the counter test.

@dhardy dhardy merged commit d28a658 into master Sep 8, 2025
16 checks passed
@dhardy dhardy deleted the push-nqommrnzsqkq branch September 8, 2025 07:37
@dhardy dhardy mentioned this pull request Sep 8, 2025
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants